Towards End-to-end Unsupervised Speech Recognition
Unsupervised speech recognition has shown great potential to make Automatic
Speech Recognition (ASR) systems accessible to every language. However,
existing methods still heavily rely on hand-crafted pre-processing. Similar to
the trend of making supervised speech recognition end-to-end, we introduce
wav2vec-U 2.0, which does away with all audio-side pre-processing and improves accuracy
through a better architecture. In addition, we introduce an auxiliary
self-supervised objective that ties model predictions back to the input.
Experiments show that wav2vec-U 2.0 improves unsupervised recognition results across
different languages while being conceptually simpler.
Comment: Preprint